Skip to content

Revising check_model()#698

Open
strengejacke wants to merge 28 commits into
mainfrom
strengejacke/issue697
Open

Revising check_model()#698
strengejacke wants to merge 28 commits into
mainfrom
strengejacke/issue697

Conversation

@strengejacke
Copy link
Copy Markdown
Member

@strengejacke strengejacke commented Mar 18, 2024

@strengejacke
Copy link
Copy Markdown
Member Author

@bwiernik and @mccarthy-m-g - I think the implementation works quite well now for the first methods. The plot() method works fine for the Q-Q plots (easystats/see#329). Things we should consider:

  • We would have to update the plot for overdispersion/zero-inflation checks (see DHARMa implementation for new check_residuals() function #643 (comment)). This is still based on the classical residuals, not the simulated ones. I have played around with revising the current code, copying the function .diag_overdispersion into a new .new_diag_overdispersion (see

    .new_diag_overdispersion <- function(model, ...) {
    ). This did not really work - do you have any ideas how we can have new plots for overdispersion/zero-inflation? I found these plots quite informative and would like to keep them, beside the Q-Q plot. See also discussion here.

  • check_zeroinflation() and check_overdispersion() now rely on simulate_residuals() for zero-inflated or negative binomial models etc., so only the really "simple" models that returned identical results to DHARMa-tests still use the old code. Only Poisson mixed models also use the "old" code, which still could be inaccurate (see check_overdispersion underestimates dispersion in mixed models #464) - do we want to use simulate_residuals() in general for mixed models when calling check_zeroinflation() and check_overdispersion()?

  • How do we want to deal with y axis limits when detrend = TRUE in Q-Q plots? (see DHARMa implementation for new check_residuals() function #643 (comment))

  • A method for check_autocorrelation() is not yet implemented. Since the DHARMa tests require additional information, I'm not sure about how to best implement such methods.

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 18, 2024

Codecov Report

Attention: Patch coverage is 12.50000% with 7 lines in your changes missing coverage. Please review.

Project coverage is 57.73%. Comparing base (6e1eb60) to head (2be53b8).
Report is 3 commits behind head on main.

Files Patch % Lines
R/check_model_diagnostics.R 12.50% 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #698      +/-   ##
==========================================
+ Coverage   57.69%   57.73%   +0.03%     
==========================================
  Files          87       87              
  Lines        6444     6464      +20     
==========================================
+ Hits         3718     3732      +14     
- Misses       2726     2732       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@strengejacke
Copy link
Copy Markdown
Member Author

strengejacke commented May 25, 2026

Results for overdispersion plots from implementation in #913 and CRAN version

library(lme4)
#> Loading required package: Matrix
library(glmmTMB)
library(dplyr) ## for mutate_at, %>%
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(performance)

#Build example data
x <- c("A", "B", "C", "D")
time <- rep(x, each=20, times=3) #time factor
y <- c("exposed", "ref1", "ref2")
lake <- rep (y, each=80)  #lake factor
set.seed(123)
min <- runif(n=240, min=4.5, max=5.5) #mins used in model offset
set.seed(123)
count <- rnbinom(n=240,mu=10,size=100) #randomly generated negative binomial data

#make data frame
dat <- as.data.frame(cbind(time, lake, min, count)) 
dat <- dat |> 
   mutate_at(c('min', 'count'), as.numeric)

#remove one combination of factors to make example rank deficient (all observations from time A and lake ref1)
dat2 <- filter(dat, time!="A" | lake !="ref1")

model <-glmmTMB(count~time*lake, family=nbinom1,
                      control = glmmTMBControl(rank_check = "adjust"),
                      offset=log(min), data=dat2)
#> dropping columns from rank-deficient conditional model: timeD:lakeref1
check_overdispersion(model) |> plot()

PR 919

CRAN version

image
set.seed(3)
mu <- rpois(500, lambda = 3)
x <- rnorm(500, mu, mu * 3) |> ceiling() |> pmax(0)

quine.nb1 <- MASS::glm.nb(x ~ mu)
check_overdispersion(quine.nb1) |> plot()
#> `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
#> `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

PR 913

CRAN version

image
set.seed(101)
d <- data.frame(x = runif(1000), f = factor(sample(1:200, size = 1000, replace = TRUE))) # modified for more random effects
suppressMessages(
  d$y <- simulate(
    ~ x + (1 | f),
    family = poisson,
    newdata = d,
    newparams = list(theta = 1, beta = c(0, 2))
  )[[1]]
)
m1 <- glmer(y ~ x + (1 | f), data = d, family = poisson)
check_overdispersion(m1) |> plot()

PR 913

CRAN version

image
docvisit <- datawizard::data_read("~/../Downloads/docvisit.txt")

mp <- glmmTMB(
  doctorco ~ sex + illness + income + hscore, 
  data = docvisit,
  family = poisson()
)
check_overdispersion(mp) |> plot()

PR 913

CRAN version

image
mnb <- glmmTMB(
  doctorco ~ sex + illness + income + hscore,
  data = docvisit,
  family = nbinom2()
)
check_overdispersion(mnb) |> plot()

PR 913

CRAN version

image
mzip <- glmmTMB(
  doctorco ~ sex + illness + income + hscore,
  ziformula = ~age,
  data = docvisit,
  family = poisson()
)
check_overdispersion(mzip) |> plot()

PR 913

CRAN version

image
mzinb <- glmmTMB(
  doctorco ~ sex + illness + income + hscore,
  ziformula = ~age,
  data = docvisit,
  family = nbinom2()
)
check_overdispersion(mzinb) |> plot()

PR 913

CRAN version

image
mzinbd <- glmmTMB(
  doctorco ~ sex + illness + income + hscore + age,
  ziformula = ~ sex + illness + income + hscore + age,
  dispformula = ~ sex + illness + income + hscore + age,
  data = docvisit,
  family = nbinom2()
)
check_overdispersion(mzinbd) |> plot()

PR 913

CRAN

image

Created on 2026-05-25 with reprex v2.1.1

@strengejacke

This comment was marked as outdated.

@strengejacke
Copy link
Copy Markdown
Member Author

Most plots are very similar, but I think for some plots, #913 is an improvement. I don't see any disadvantages for PR #913

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Revising check_model()

3 participants